Statistical Machine Translation of French and German into English Using IBM Model 2 Greedy Decoding
Abstract
The job of a decoder in statistical machine translation is to find the most probable translation of a given sentence under a set of previously learned parameters. Because the search space of potential translations is effectively unbounded, designing a decoder always involves a trade-off between accuracy and speed. Germann et al. [4] recently presented a fast, greedy decoder that starts with an initial guess and then refines it through small “mutations” that yield more probable translations. That decoder was designed for the IBM Model 4 translation model, which is a sophisticated model of the translation process but is also complex, and therefore difficult to implement and fairly slow in training and decoding. We present modifications to the greedy decoder of [4] that allow it to work with the simpler and more efficient IBM Model 2. We tested our modified decoder by having it translate equivalent French and German sentences into English, and we report the resulting translations and their accuracies. Because we are interested in the relative effectiveness of our decoder across language pairs, we discuss the discrepancies between our French-to-English and German-to-English results, and we speculate on the properties of these languages that may have contributed to them.
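As a rough illustration of the idea of greedy decoding under IBM Model 2, the following minimal sketch hill-climbs from a word-for-word gloss. It is not the decoder described in the abstract: the probability tables are hand-made toy values, `align_prob` is a hypothetical diagonal-favouring stand-in for a learned alignment table a(i | j, l, m), and only two mutations (replace one word, swap adjacent words) are tried, which is a simplified subset of the mutations in [4].

```python
import math

# Toy, hand-made parameters (assumptions for illustration, not learned values).
T = {  # t(f | e): probability of French word f given English word e
    ("la", "the"): 0.7,
    ("maison", "house"): 0.8,
    ("maison", "home"): 0.2,
    ("bleue", "blue"): 0.8,
}
BIGRAM = {  # P(w2 | w1) for a toy English bigram language model
    ("<s>", "the"): 0.6,
    ("the", "blue"): 0.3,
    ("the", "house"): 0.3,
    ("blue", "house"): 0.5,
    ("house", "blue"): 0.01,
    ("house", "</s>"): 0.5,
    ("blue", "</s>"): 0.1,
}
FLOOR = 1e-6  # probability assigned to anything missing from the toy tables
VOCAB = sorted({e for (_, e) in T})


def align_prob(i, j, l, m):
    """Hypothetical stand-in for a(i | j, l, m): favours near-diagonal alignments."""
    weights = [math.exp(-abs(k / l - j / m)) for k in range(l)]
    return weights[i] / sum(weights)


def score(e, f):
    """log P(e) + log P(f | e); under Model 2 the alignment sum factorises per source word."""
    words = ["<s>"] + e + ["</s>"]
    lm = sum(math.log(BIGRAM.get(bg, FLOOR)) for bg in zip(words, words[1:]))
    tm = 0.0
    for j, fw in enumerate(f):
        tm += math.log(sum(
            T.get((fw, ew), FLOOR) * align_prob(i, j, len(e), len(f))
            for i, ew in enumerate(e)
        ))
    return lm + tm


def neighbours(e, vocab):
    """All hypotheses one mutation away: replace one word, or swap two adjacent words."""
    for i in range(len(e)):
        for w in vocab:
            if w != e[i]:
                yield e[:i] + [w] + e[i + 1:]
    for i in range(len(e) - 1):
        yield e[:i] + [e[i + 1], e[i]] + e[i + 2:]


def greedy_decode(f, vocab):
    """Start from a word-for-word gloss and apply the best mutation until none improves."""
    current = [max(vocab, key=lambda ew: T.get((fw, ew), FLOOR)) for fw in f]
    best = score(current, f)
    while True:
        candidate = max(neighbours(current, vocab), key=lambda e: score(e, f), default=None)
        if candidate is None or score(candidate, f) <= best:
            return current
        current, best = candidate, score(candidate, f)


print(greedy_decode(["la", "maison", "bleue"], VOCAB))
```

With these toy tables, the initial gloss "the house blue" is reordered to "the blue house" by a single swap, after which no mutation raises the score and the search stops. Because the search only ever accepts improvements, it can stop at a local optimum; that is the speed-for-accuracy trade-off the abstract refers to.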
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
A New Decoding Algorithm for Statistical Machine Translation: Design and Implementation
We describe a new algorithm for the Decoding problem in Statistical Machine Translation. Our algorithm is based on the Alternating Optimization framework and employs dynamic programming. The time complexity of the algorithm is O m , where m is the length of the sentence to be translated, which is the best among all known algorithms for the problem. As the search space explored by the algorithm ...
Squibs and Discussions: Decoding Complexity in Word-Replacement Translation Models
Statistical machine translation is a relatively new approach to the long-standing problem of translating human languages by computer. Current statistical techniques uncover translation rules from bilingual training texts and use those rules to translate new texts. The general architecture is the source-channel model: an English string is statistically generated (source), then statistically tran...
Decoding Complexity in Word-Replacement Translation Models
Statistical machine translation is a relatively new approach to the longstanding problem of translating human languages by computer. Current statistical techniques uncover translation rules from bilingual training texts and use those rules to translate new texts. The general architecture is the source-channel model: an English string is statistically generated (source), then statistically transform...
Stanford University's Submissions to the WMT 2014 Translation Task
We describe Stanford’s participation in the French-English and English-German tracks of the 2014 Workshop on Statistical Machine Translation (WMT). Our systems used large feature sets, word classes, and an optional unconstrained language model. Among constrained systems, ours performed the best according to uncased BLEU: 36.0% for French-English and 20.9% for English-German.